Hierarchical deep multi-modal network for medical visual question answering
Authors
Abstract
Visual Question Answering in the Medical domain (VQA-Med) plays an important role in providing medical assistance to end-users. These users are expected to raise either a straightforward question with a Yes/No answer or a challenging question that requires a detailed and descriptive answer. The existing VQA-Med techniques fail to distinguish between these question types, which sometimes complicates the simpler problems and over-simplifies the complicated ones; maintaining several distinct systems for the different question types, on the other hand, can lead to confusion and discomfort for end-users. To address this issue, we propose a hierarchical deep multi-modal network that analyzes and classifies end-user questions/queries and then incorporates a query-specific approach to answer prediction. We refer to our proposed model as Hierarchical Question Segregation based Visual Question Answering, in short HQS-VQA. Our contributions are three-fold, viz. firstly, a question segregation (QS) technique for VQA-Med; secondly, the integration of the QS model into a hierarchical deep neural network to generate proper answers to queries related to medical images; and thirdly, a study of the impact of QS on Medical-VQA by comparing performance with and without QS. We evaluate the proposed model on two benchmark datasets, RAD and CLEF18. Experimental results show that HQS-VQA outperforms the baseline models by significant margins. We also conduct a quantitative and qualitative analysis of the obtained results and discover potential causes of errors and their solutions.
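The hierarchical routing idea from the abstract can be illustrated with a minimal sketch: first a question-segregation (QS) step labels an incoming question as Yes/No or descriptive, then the question is routed to a type-specific answering component. The keyword-based classifier and the stub answerers below are hypothetical stand-ins for illustration only, not the paper's trained neural modules.

```python
# Hypothetical sketch of question segregation (QS) followed by
# type-specific answer prediction. Real HQS-VQA uses learned models;
# here the QS step is a toy heuristic on the question's first word.

YES_NO_STARTERS = ("is", "are", "does", "do", "was", "were", "can", "has", "have")

def segregate(question: str) -> str:
    """Toy QS step: label a question 'yes_no' or 'descriptive'."""
    first_word = question.strip().lower().split()[0]
    return "yes_no" if first_word in YES_NO_STARTERS else "descriptive"

def answer_yes_no(question: str, image_features) -> str:
    # Stand-in for a binary-classification head over fused
    # question/image features.
    return "yes"

def answer_descriptive(question: str, image_features) -> str:
    # Stand-in for a sequence-generation decoder producing a
    # free-form descriptive answer.
    return "a descriptive answer"

def hqs_vqa(question: str, image_features=None) -> str:
    """Route the question to the answerer matching its QS label."""
    if segregate(question) == "yes_no":
        return answer_yes_no(question, image_features)
    return answer_descriptive(question, image_features)

print(segregate("Is there a fracture in the left tibia?"))    # yes_no
print(segregate("What abnormality is seen in this CT scan?")) # descriptive
```

The design point mirrored here is that the two answer heads can use entirely different output spaces (a binary label vs. generated text), which is why a single undifferentiated model tends to handle one type at the expense of the other.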
Similar resources
Deep Learning for Visual Question Answering
This project deals with the problem of Visual Question Answering (VQA). We develop neural network-based models to answer open-ended questions that are grounded in images. We used the newly released VQA dataset (with about 750K questions) to carry out our experiments. Our model makes use of two popular neural network architectures: Convolutional Neural Nets (CNN) and Long Short Term Memory Networ...
Hierarchical Question-Image Co-Attention for Visual Question Answering
A number of recent works have proposed attention models for Visual Question Answering (VQA) that generate spatial maps highlighting image regions relevant to answering the question. In this paper, we argue that in addition to modeling “where to look” or visual attention, it is equally important to model “what words to listen to” or question attention. We present a novel co-attention model for V...
Multi-Modal Question-Answering: Questions without Keyboards
This paper describes our work to allow players in a virtual world to pose questions without relying on textual input. Our approach is to create enhanced virtual photographs by annotating them with semantic information from the 3D environment’s scene graph. The player can then use these annotated photos to interact with inhabitants of the world through automatically generated queries that are gu...
Dual Attention Network for Visual Question Answering
Visual Question Answering (VQA) is a popular research problem that involves inferring answers to natural language questions about a given visual scene. Recent neural network approaches to VQA use attention to select relevant image features based on the question. In this paper, we propose a novel Dual Attention Network (DAN) that not only attends to image features, but also to question features....
Visual Question Answering using Deep Learning
Multimodal learning between images and language has gained the attention of researchers over the past few years. Using recent deep learning techniques, specifically end-to-end trainable artificial neural networks, performance in tasks like automatic image captioning and bidirectional sentence and image retrieval has been significantly improved. Recently, as a further exploration of present artificial...
Journal
Journal title: Expert Systems With Applications
Year: 2021
ISSN: 1873-6793, 0957-4174
DOI: https://doi.org/10.1016/j.eswa.2020.113993